CHANGE-POINT DETECTION UNDER DEPENDENCE BASED ON TWO-SAMPLE U-STATISTICS

HEROLD DEHLING, ROLAND FRIED, ISABEL GARCIA, AND MARTIN WENDLER

Abstract. We study the detection of change-points in time series. The classical CUSUM statistic for detection of jumps in the mean is known to be sensitive to outliers. We thus propose a robust test based on the Wilcoxon two-sample test statistic. The asymptotic distribution of this test can be derived from a functional central limit theorem for two-sample U-statistics. We extend a theorem of Csörgő and Horváth to the case of dependent data.

1. Introduction



Change-point tests address the question whether a stochastic process is stationary during the entire observation period or not. In the case of independent data, there is a well-developed theory; see the book by Csörgő and Horváth (1997) for an excellent survey. When the data are dependent, much less is known. The CUSUM statistic has been intensely studied, even for dependent data; see again Csörgő and Horváth (1997). The CUSUM test, however, is not robust against outliers in the data. In the present paper, we study a robust test which is based on the two-sample Wilcoxon test statistic. Simulations show that this test outperforms the CUSUM test in the case of heavy-tailed data.

In order to derive the asymptotic distribution of the test, we study the stochastic process 

(1) $\sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} h(X_i, X_j), \quad 0 \le \lambda \le 1,$

where $h : \mathbb{R}^2 \to \mathbb{R}$ is a kernel function. In the case of independent data, the asymptotic distribution of this process has been studied by Csörgő and Horváth (1988). In the present paper, we extend their result to short range dependent data $(X_i)_{i \ge 1}$. Similar results have been obtained for long range dependent data by Dehling, Rooch and Taqqu (2012), albeit with completely different methods.

U-statistics have been introduced by Hoeffding (1948), where the asymptotic normality 
was established both for the one-sample as well as the two-sample U-statistic in the case of 
independent data. The asymptotic distribution of one-sample U-statistics of dependent data 
was studied by Sen (1963, 1972), Yoshihara (1976), Denker and Keller (1983, 1985) and by 
Borovkova, Burton and Dehling (2001) in the so-called non-degenerate case, and by Babbel 
(1989) and Leucht (2012) in the degenerate case. For two-sample U-statistics, Dehling and 
Fried (2012) established the asymptotic normality of $\sum_{i=1}^{n_1} \sum_{j=n_1+1}^{n_1+n_2} h(X_i, X_j)$ for dependent
data, when $n_1, n_2 \to \infty$. The main theoretical result of the present paper is a functional
version of this limit theorem. 

In our paper, we focus on data that can be represented as functionals of a mixing process. 
In this way, we cover most examples from time series analysis, such as ARMA and ARCH 
processes, but also data from chaotic dynamical systems. For a survey of processes that have 
a representation as functional of a mixing process, see e.g. Borovkova, Burton and Dehling 
(2001). Earlier references can be found in Ibragimov and Linnik (1970) and Billingsley 
(1968). 

2. Definitions and Main Results 

Given the samples $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$, and a kernel $h(x, y)$, we define the two-sample U-statistic

$U_{n_1,n_2} := \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} h(X_i, Y_j).$

More generally, one can define U-statistics with multivariate kernels $h(x_1, \dots, x_k, y_1, \dots, y_l)$. In the present paper, for the ease of exposition, we will restrict attention to bivariate kernels $h(x, y)$. The main results, however, can easily be extended to the multivariate case.

Assuming that $(X_i)_{i \ge 1}$ and $(Y_i)_{i \ge 1}$ are stationary processes with one-dimensional marginal distribution functions $F$ and $G$, respectively, we can test the hypothesis $H : F = G$ using the two-sample U-statistic. E.g., the kernel $h(x, y) = y - x$ leads to the U-statistic

$U_{n_1,n_2} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} (Y_j - X_i) = \frac{1}{n_2} \sum_{j=1}^{n_2} Y_j - \frac{1}{n_1} \sum_{i=1}^{n_1} X_i$

and thus to the familiar two-sample Gauss test. Similarly, the kernel $h(x, y) = 1_{\{x < y\}}$ leads to the U-statistic

$U_{n_1,n_2} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} 1_{\{X_i < Y_j\}},$

and thus to the two-sample Mann-Whitney-Wilcoxon test.
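The two examples above can be illustrated with a minimal sketch. The function and variable names (`two_sample_u`, `diff_kernel`, `wilcoxon_kernel`) are hypothetical and chosen only for this illustration; the code evaluates the two-sample U-statistic by brute force over all pairs.

```python
from typing import Callable, Sequence


def two_sample_u(xs: Sequence[float], ys: Sequence[float],
                 h: Callable[[float, float], float]) -> float:
    """Average of h(x, y) over all pairs with x taken from xs and y from ys."""
    total = sum(h(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def diff_kernel(x: float, y: float) -> float:
    """Kernel h(x, y) = y - x (difference of sample means)."""
    return y - x


def wilcoxon_kernel(x: float, y: float) -> float:
    """Kernel h(x, y) = 1{x < y} (Mann-Whitney-Wilcoxon)."""
    return 1.0 if x < y else 0.0


xs = [0.1, 0.4, 0.9]
ys = [0.3, 1.2]
u_diff = two_sample_u(xs, ys, diff_kernel)          # equals mean(ys) - mean(xs)
u_wilcoxon = two_sample_u(xs, ys, wilcoxon_kernel)  # fraction of pairs with x < y
```

With the difference kernel the double sum collapses to the difference of the two sample means, while the Wilcoxon kernel depends on the data only through the ordering of the observations, which is the source of its robustness against outliers.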

In the present paper, we investigate tests for a change-point in the mean of a stochastic process $(X_i)_{i \ge 1}$. We consider the model

$X_i = \mu_i + \xi_i, \quad i \ge 1,$

where $(\mu_i)_{i \ge 1}$ are unknown constants and where $(\xi_i)_{i \ge 1}$ is a stochastic process. We want to test the hypothesis

$H : \mu_1 = \dots = \mu_n$

against the alternative

$A$ : There exists $1 \le k \le n-1$ such that $\mu_1 = \dots = \mu_k \ne \mu_{k+1} = \dots = \mu_n$.

Tests for the change-point problem are often derived from two-sample tests applied to the samples $X_1, \dots, X_k$ and $X_{k+1}, \dots, X_n$, for all possible $1 \le k \le n-1$. For two-sample tests based on U-statistics with kernel $h(x, y)$, this leads to the test statistic $\sum_{i=1}^{k} \sum_{j=k+1}^{n} h(X_i, X_j)$, $1 \le k < n$, and thus to the processes

(2) $U_n(\lambda) = \sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} h(X_i, X_j), \quad 0 \le \lambda \le 1.$

In this paper, we will derive a functional limit theorem for the processes $(U_n(\lambda))_{0 \le \lambda \le 1}$. Specifically, we will show that under certain technical assumptions on the kernel $h$ and on the process $(X_i)_{i \ge 1}$, a properly centered and renormalized version of $(U_n(\lambda))_{0 \le \lambda \le 1}$ converges to a Gaussian process.

In our paper, we will assume that the process $(\xi_i)_{i \ge 0}$ is weakly dependent. More specifically, we will assume that $(\xi_i)_{i \ge 0}$ can be represented as a functional of an absolutely regular process.

Definition 2.1. (i) Given a stochastic process $(X_n)_{n \in \mathbb{Z}}$ we denote by $\mathcal{A}_k^l$ the $\sigma$-algebra generated by $(X_k, \dots, X_l)$. The process is called absolutely regular if

(3) $\beta(k) = \sup_{n} \Bigl\{ \sup \sum_{j=1}^{J} \sum_{i=1}^{I} |P(A_i \cap B_j) - P(A_i) P(B_j)| \Bigr\} \to 0,$

where the last supremum is over all finite $\mathcal{A}_1^n$-measurable partitions $(A_1, \dots, A_I)$ and all finite $\mathcal{A}_{n+k}^{\infty}$-measurable partitions $(B_1, \dots, B_J)$.

(ii) The process is called strongly mixing if

(4) $\alpha(k) = \sup \{ |P(A \cap B) - P(A) P(B)| : A \in \mathcal{A}_1^n, B \in \mathcal{A}_{n+k}^{\infty}, n \in \mathbb{N} \} \to 0.$

(iii) The process $(X_n)_{n \ge 1}$ is called a two-sided functional of an absolutely regular sequence if there exists an absolutely regular process $(Z_n)_{n \in \mathbb{Z}}$ and a measurable function $f : \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}$ such that

$X_i = f((Z_{i+n})_{n \in \mathbb{Z}}).$

Analogously, $(X_n)_{n \ge 1}$ is called a one-sided functional if $X_i = f((Z_{i+n})_{n \ge 0})$.

(iv) The process $(X_n)_{n \ge 1}$ is called 1-approximating functional with coefficients $(a_k)_{k \ge 1}$ if

(5) $E |X_i - E(X_i \mid Z_{i-k}, \dots, Z_{i+k})| \le a_k.$

In addition to weak dependence conditions on the process $(X_i)_{i \ge 1}$, the asymptotic analysis of the process (2) requires some continuity assumptions on the kernel functions $h(x, y)$. We use the notion of 1-continuity, which was introduced by Borovkova, Burton and Dehling (2001). Alternative continuity conditions have been used by Denker and Keller (1986).

Definition 2.2. The kernel $h(x, y)$ is called 1-continuous, if there exists a function $\phi : (0, \infty) \to (0, \infty)$ with $\phi(\epsilon) = o(1)$ as $\epsilon \to 0$ such that for all $\epsilon > 0$

(6) $E(|h(X', Y) - h(X, Y)| \, 1_{\{|X - X'| \le \epsilon\}}) \le \phi(\epsilon)$

(7) $E(|h(X, Y') - h(X, Y)| \, 1_{\{|Y - Y'| \le \epsilon\}}) \le \phi(\epsilon)$

for all random variables $X, X', Y$ and $Y'$ having the same marginal distribution as $X$.




The most important technical tool in the study of U-statistics is Hoeffding's decomposition, 
originally introduced by Hoeffding (1948). We write 

(8) $h(x, y) = \theta + h_1(x) + h_2(y) + g(x, y),$

where the terms on the right-hand side are defined as follows:

$\theta = E h(X, Y)$

$h_1(x) = E h(x, Y) - \theta$

$h_2(y) = E h(X, y) - \theta$

$g(x, y) = h(x, y) - h_1(x) - h_2(y) - \theta.$

Here, $X$ and $Y$ are two independent random variables with the same distribution as $X_1$. Observe that, by Fubini's theorem,

$E(h_1(X)) = E(h_2(X)) = 0.$

In addition, the kernel $g(x, y)$ is degenerate in the sense of the following definition.

Definition 2.3. Let $(X_i)_{i \ge 1}$ be a stationary process, and let $g(x, y)$ be a measurable function. We say that $g(x, y)$ is degenerate if

(9) $E(g(x, X_1)) = E(g(X_1, y)) = 0,$

for all $x, y \in \mathbb{R}$.
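To make the decomposition concrete, the following sketch works it out for the Wilcoxon kernel $h(x, y) = 1_{\{x < y\}}$ under the illustrative assumption (not made in the paper) that the marginal distribution is uniform on $(0, 1)$. In that case $\theta = 1/2$, $h_1(x) = 1/2 - x$ and $h_2(y) = y - 1/2$, and the degeneracy of $g$ can be checked numerically by averaging over a fine midpoint grid. The grid size `M` and all function names are arbitrary choices for this example.

```python
THETA = 0.5  # E h(X, Y) = P(X < Y) for independent continuous X, Y


def h(x: float, y: float) -> float:
    return 1.0 if x < y else 0.0


def h1(x: float) -> float:
    # E h(x, Y) - theta = P(x < Y) - 1/2 = (1 - x) - 1/2 for Y ~ Uniform(0, 1)
    return 0.5 - x


def h2(y: float) -> float:
    # E h(X, y) - theta = P(X < y) - 1/2 = y - 1/2 for X ~ Uniform(0, 1)
    return y - 0.5


def g(x: float, y: float) -> float:
    # Degenerate remainder in the Hoeffding decomposition
    return h(x, y) - h1(x) - h2(y) - THETA


M = 10_000
GRID = [(k + 0.5) / M for k in range(M)]  # midpoint rule on (0, 1)


def mean_over_second(x: float) -> float:
    """Numerical approximation of E g(x, X_1)."""
    return sum(g(x, y) for y in GRID) / M


def mean_over_first(y: float) -> float:
    """Numerical approximation of E g(X_1, y)."""
    return sum(g(x, y) for x in GRID) / M


# Largest deviation from zero over a few test points; should be close to 0.
worst = max(
    max(abs(mean_over_second(p)), abs(mean_over_first(p)))
    for p in (0.1, 0.37, 0.5, 0.9)
)
```

Both conditional means vanish up to discretization error, which is the degeneracy property (9) for this particular kernel and marginal distribution.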

The following theorem, a functional central limit theorem for two-sample U-statistics of dependent data, is the main theoretical result of the present paper.

Theorem 2.4. Let $(X_n)_{n \ge 1}$ be a 1-approximating functional with constants $(a_k)_{k \ge 1}$ of an absolutely regular process with mixing coefficients $(\beta(k))_{k \ge 1}$, satisfying

(10) $\sum_{k=1}^{\infty} k^2 (\beta(k) + \sqrt{a_k} + \phi(a_k)) < \infty,$

and let $h(x, y)$ be a 1-continuous bounded kernel. Then, as $n \to \infty$, the $D[0, 1]$-valued process

(11) $T_n(\lambda) := \frac{1}{n^{3/2}} \sum_{i=1}^{[\lambda n]} \sum_{j=[\lambda n]+1}^{n} (h(X_i, X_j) - \theta), \quad 0 \le \lambda \le 1,$

converges in distribution towards a mean-zero Gaussian process with representation

(12) $Z(\lambda) = (1 - \lambda) W_1(\lambda) + \lambda (W_2(1) - W_2(\lambda)), \quad 0 \le \lambda \le 1,$

where $(W_1(\lambda), W_2(\lambda))_{0 \le \lambda \le 1}$ is a two-dimensional Brownian motion with mean zero and covariance function $\mathrm{Cov}(W_k(s), W_l(t)) = \min\{s, t\} \sigma_{kl}$, where

(13) $\sigma_{kl} = E(h_k(X_0) h_l(X_0)) + 2 \sum_{j=1}^{\infty} \mathrm{Cov}(h_k(X_0), h_l(X_j)), \quad k, l = 1, 2.$

Remark 2.5. (i) In the case of i.i.d. data, Theorem 2.4 was established by Csörgő and Horváth (1988). In the case of long-range dependent data, weak convergence of the process $(T_n(\lambda))_{0 \le \lambda \le 1}$ has been studied by Dehling, Rooch and Taqqu (2013) and by Rooch (2012), albeit with a normalization different from $n^{3/2}$.

(ii) Using the representation (12), one can calculate the autocovariance function of the process $(Z(\lambda))_{0 \le \lambda \le 1}$. We obtain

(14) $\mathrm{Cov}(Z(\lambda), Z(\mu)) = \sigma_{11} [(1 - \lambda)(1 - \mu) \min\{\lambda, \mu\}] + \sigma_{22} [\lambda \mu (1 - \mu - \lambda + \min\{\lambda, \mu\})] + \sigma_{12} [\mu (1 - \lambda)(\lambda - \min\{\lambda, \mu\}) + \lambda (1 - \mu)(\mu - \min\{\lambda, \mu\})].$

(iii) For the kernel $h(x, y) = y - x$, we can analyze the asymptotic behavior of the process $T_n(\lambda)$ using the functional central limit theorem (FCLT). Note that, since $X_j - X_i = (X_j - E(X_j)) - (X_i - E(X_i))$, we may assume without loss of generality that $X_i$ has mean zero. Then we get the representation

(15) $T_n(\lambda) = \frac{1}{n^{3/2}} \sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} (X_j - X_i) = \frac{[n\lambda]}{n} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i - \frac{1}{\sqrt{n}} \sum_{i=1}^{[n\lambda]} X_i.$

Thus, weak convergence of $(T_n(\lambda))_{0 \le \lambda \le 1}$ can be derived from the FCLT for the partial sum process $\frac{1}{\sqrt{n}} \sum_{i=1}^{[n\lambda]} X_i$. Such FCLTs have been proved under a wide range of conditions, e.g. for functionals of absolutely regular data.
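The partial-sum representation in Remark 2.5 (iii) is a purely algebraic identity, so it can be checked numerically on any data vector. The sketch below compares the double sum with the partial-sum form for a seeded Gaussian sample; the sample, the grid of values of $\lambda$ and the function names are illustrative assumptions.

```python
import math
import random


def t_n_direct(x: list, lam: float) -> float:
    """n^{-3/2} * sum_{i <= [n lam]} sum_{j > [n lam]} (x_j - x_i), by brute force."""
    n = len(x)
    k = int(n * lam)
    total = sum(x[j] - x[i] for i in range(k) for j in range(k, n))
    return total / n ** 1.5


def t_n_partial_sums(x: list, lam: float) -> float:
    """Same quantity written with partial sums: (k/n) S_n / sqrt(n) - S_k / sqrt(n)."""
    n = len(x)
    k = int(n * lam)
    s_k = sum(x[:k])
    s_n = sum(x)
    return (k / n) * s_n / math.sqrt(n) - s_k / math.sqrt(n)


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(60)]

# Largest discrepancy between the two forms over a grid of lambda values.
max_gap = max(
    abs(t_n_direct(x, step / 20) - t_n_partial_sums(x, step / 20))
    for step in range(21)
)
```

The partial-sum form needs only $O(n)$ operations for all split points together, compared with $O(n^2)$ per split point for the double sum.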

We finally want to state an important special case of Theorem 2.4, namely when the kernel is anti-symmetric, i.e. when $h(x, y) = -h(y, x)$. Kernels that occur in connection with change-point tests usually have this property. For anti-symmetric kernels, the limit process has a much simpler structure; moreover one can give a simpler direct proof in this case.

Theorem 2.6. Let $(X_n)_{n \ge 1}$ be a 1-approximating functional with constants $(a_k)_{k \ge 1}$ of an absolutely regular process with mixing coefficients $(\beta(k))_{k \ge 1}$, satisfying (10), and let $h(x, y)$ be a 1-continuous bounded anti-symmetric kernel. Then, as $n \to \infty$, the $D[0, 1]$-valued process

(16) $T_n(\lambda) := \frac{1}{n^{3/2}} \sum_{i=1}^{[\lambda n]} \sum_{j=[\lambda n]+1}^{n} (h(X_i, X_j) - \theta), \quad 0 \le \lambda \le 1,$

converges in distribution towards the mean-zero Gaussian process $\sigma W^{(0)}(\lambda)$, $0 \le \lambda \le 1$, where $(W^{(0)}(\lambda))_{0 \le \lambda \le 1}$ is a standard Brownian bridge and

(17) $\sigma^2 = \mathrm{Var}(h_1(X_1)) + 2 \sum_{k=2}^{\infty} \mathrm{Cov}(h_1(X_1), h_1(X_k)).$
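As a consistency check, the Brownian bridge limit of Theorem 2.6 can be recovered from the covariance formula (14). The sketch below assumes an anti-symmetric kernel, so that $h_2 = -h_1$ and hence, by (13), $\sigma_{22} = \sigma_{11}$ and $\sigma_{12} = -\sigma_{11}$. It treats the case $\lambda \le \mu$, so that $\min\{\lambda, \mu\} = \lambda$; the other case is symmetric.

```latex
\begin{align*}
\operatorname{Cov}(Z(\lambda), Z(\mu))
 &= \sigma_{11}\bigl[(1-\lambda)(1-\mu)\lambda + \lambda\mu(1-\mu)\bigr]
    - \sigma_{11}\,\lambda(1-\mu)(\mu-\lambda) \\
 &= \sigma_{11}\,\lambda(1-\mu)\bigl[(1-\lambda) + \mu - (\mu-\lambda)\bigr] \\
 &= \sigma_{11}\,\lambda(1-\mu)
  = \sigma_{11}\bigl(\min\{\lambda,\mu\} - \lambda\mu\bigr).
\end{align*}
```

This is the covariance function of $\sigma W^{(0)}$ with $\sigma^2 = \sigma_{11}$, in agreement with (17).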



3. Application to Change Point Problems 

In this section, we will apply Theorem 2.4 in order to derive the asymptotic distribution of two change-point test statistics. Specifically, we wish to test the null hypothesis

(18) $H_0 : \mu_1 = \dots = \mu_n$

against the alternative of a level shift at an unknown point in time, i.e.

(19) $H_A : \mu_1 = \dots = \mu_k \ne \mu_{k+1} = \dots = \mu_n$, for some $k \in \{1, \dots, n-1\}$.




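We consider the following two test statistics,

(20) $T_{1,n} = \max_{1 \le k < n} \Bigl| \frac{1}{n^{3/2}} \sum_{i=1}^{k} \sum_{j=k+1}^{n} \bigl( 1_{\{X_i < X_j\}} - \tfrac{1}{2} \bigr) \Bigr|$

(21) $T_{2,n} = \max_{1 \le k < n} \Bigl| \frac{1}{n^{3/2}} \sum_{i=1}^{k} \sum_{j=k+1}^{n} (X_j - X_i) \Bigr|.$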
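A minimal computational sketch of the two statistics follows. It assumes that $T_{1,n}$ and $T_{2,n}$ are the maxima over the split point $k$ of the absolute values of the $n^{-3/2}$-normalized two-sample U-statistics with kernels $1_{\{x<y\}} - 1/2$ and $y - x$, as the proof of Theorem 3.1 and Remark 2.5 (iii) indicate. Function names and the example data are hypothetical.

```python
def wilcoxon_cp_statistic(x: list) -> float:
    """Wilcoxon-type statistic: max_k n^{-3/2} |sum_{i<=k<j} (1{x_i < x_j} - 1/2)|."""
    n = len(x)
    best = 0.0
    w = 0.0  # value of the double sum for the current split point k
    for k in range(1, n):
        new = x[k - 1]  # observation that moves from the right to the left sample
        # Pairs (i, new) with i in the old left sample leave the double sum.
        for i in range(k - 1):
            w -= (1.0 if x[i] < new else 0.0) - 0.5
        # Pairs (new, j) with j in the new right sample enter the double sum.
        for j in range(k, n):
            w += (1.0 if new < x[j] else 0.0) - 0.5
        best = max(best, abs(w))
    return best / n ** 1.5


def cusum_cp_statistic(x: list) -> float:
    """CUSUM-type statistic: max_k n^{-3/2} |sum_{i<=k<j} (x_j - x_i)|."""
    n = len(x)
    s_n = sum(x)
    s_k = 0.0
    best = 0.0
    for k in range(1, n):
        s_k += x[k - 1]
        # The double sum equals k * S_n - n * S_k.
        best = max(best, abs(k * s_n - n * s_k))
    return best / n ** 1.5


# Illustrative data with a visible upward shift after the third observation.
sample = [0.3, -1.2, 0.8, 2.5, 1.9, 3.1, 2.2]
t1 = wilcoxon_cp_statistic(sample)
t2 = cusum_cp_statistic(sample)
```

The Wilcoxon statistic is updated incrementally from one split point to the next, giving $O(n^2)$ total cost; the CUSUM statistic uses the partial-sum identity and costs $O(n)$. Both values still have to be divided by an estimate of the corresponding long-run standard deviation before being compared with a critical value.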



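Theorem 3.1. Let $(X_n)_{n \ge 1}$ be a 1-approximating functional with constants $(a_k)_{k \ge 1}$ of an absolutely regular process with mixing coefficients $(\beta(k))_{k \ge 1}$, satisfying (10), and assume that $X_1$ has a distribution function $F(x)$ with bounded density. Then, under the null hypothesis $H_0$,

(22) $T_{1,n} \xrightarrow{\mathcal{D}} \sigma_1 \sup_{0 \le \lambda \le 1} |W^{(0)}(\lambda)|$

(23) $T_{2,n} \xrightarrow{\mathcal{D}} \sigma_2 \sup_{0 \le \lambda \le 1} |W^{(0)}(\lambda)|$

where $(W^{(0)}(\lambda))_{0 \le \lambda \le 1}$ denotes the standard Brownian bridge process, and where

(24) $\sigma_1^2 = \mathrm{Var}(F(X_1)) + 2 \sum_{k=2}^{\infty} \mathrm{Cov}(F(X_1), F(X_k))$

(25) $\sigma_2^2 = \mathrm{Var}(X_1) + 2 \sum_{k=2}^{\infty} \mathrm{Cov}(X_1, X_k).$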



Proof. We will establish weak convergence of $T_{1,n}$. In order to do so, we will apply Theorem 2.4 to the kernel $h(x, y) = 1_{\{x < y\}}$. Borovkova, Burton and Dehling (2001) showed that this kernel is 1-continuous. By continuity of the distribution function of $X_1$, we get that $\theta = P(X < Y) = 1/2$. Moreover, we get

$h_1(x) = P(x < X_1) - \frac{1}{2} = \frac{1}{2} - F(x)$

$h_2(x) = P(X_1 < x) - \frac{1}{2} = F(x) - \frac{1}{2}.$

Note that $h_2(x) = -h_1(x)$. Hence $W_2(\lambda) = -W_1(\lambda)$, and thus the limit process in Theorem 2.4 has the representation

$Z(\lambda) = (1 - \lambda) W_1(\lambda) + \lambda (W_2(1) - W_2(\lambda)) = W_1(\lambda) - \lambda W_1(1).$

Here $W_1(\lambda)$ is a Brownian motion with variance $\sigma_1^2$. Weak convergence of $T_{2,n}$ can be shown directly from the functional central limit theorem for the partial sum process; see e.g. Billingsley (1968). □

Remark 3.2. (i) The distribution of $\sup_{0 \le \lambda \le 1} |W^{(0)}(\lambda)|$ is the well-known Kolmogorov-Smirnov distribution. Quantiles of the Kolmogorov-Smirnov distribution can be found in most statistical tables.
(ii) In order to apply Theorem 3.1 we need to estimate the variances $\sigma_1^2$ and $\sigma_2^2$. Regarding $\sigma_2^2$ given in expression (25), we apply the non-overlapping subsampling estimator

(26) …

investigated by Carlstein (1986) for $\alpha$-mixing data. In case of AR(1)-processes, Carlstein derives

(27) $l_n = \max\bigl\{ \lceil n^{1/3} (2\rho / (1 - \rho^2))^{2/3} \rceil, 1 \bigr\}$

as the choice of the block length which minimizes the MSE asymptotically, with $\rho$ being the autocorrelation coefficient at lag 1.
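The block length rule can be sketched as a small helper. Two details are assumptions of this sketch rather than statements of the paper: rounding up to the next integer (the rounding bracket is not legible in the source, and rounding up reproduces the block lengths of about 6 and 16 quoted in Section 4 for $\rho = 0.4$ and $\rho = 0.8$ with $n = 200$), and the use of $|\rho|$ so that negative autocorrelation estimates do not lead to a fractional power of a negative number. The function name is hypothetical.

```python
import math


def carlstein_block_length(n: int, rho: float) -> int:
    """Block length max(ceil(n^(1/3) * (2|rho| / (1 - rho^2))^(2/3)), 1).

    Assumptions of this sketch: rounding up, and |rho| in place of rho.
    """
    r = abs(rho)
    if r >= 1.0:
        raise ValueError("the lag-one autocorrelation must satisfy |rho| < 1")
    raw = n ** (1.0 / 3.0) * (2.0 * r / (1.0 - r * r)) ** (2.0 / 3.0)
    return max(math.ceil(raw), 1)


lengths = {rho: carlstein_block_length(200, rho) for rho in (0.0, 0.4, 0.8)}
```

In practice $\rho$ is unknown and an estimate is plugged in, so the resulting block length is itself random and inherits any bias of the autocorrelation estimate.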

Regarding $\sigma_1^2$ given in (24), one faces the additional challenge that the distribution function $F$ is unknown. This problem has been addressed, e.g. in Dehling, Fried, Sharipov, Vogel and Wornowizki (2013), for the case of functionals of absolutely regular processes and $F$ being estimated by the empirical distribution function $F_n$. The authors find the subsampling estimator for $\sigma_1^2$

(28) …

employing non-overlapping subsampling to give smaller biases, but somewhat larger MSEs than the corresponding overlapping subsampling estimator. The adaptive choice of the block length $l_n$ proposed by Carlstein worked well in their simulations if the data were generated from a stationary ARMA(1,1) model and an estimate of $\rho$ was plugged in. In the next section, we will explore this and other proposals in situations with level shifts and normally or heavy-tailed innovations.
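Remark 3.2 (i) refers to tabulated quantiles of the Kolmogorov-Smirnov distribution. As an alternative to tables, the sketch below evaluates the classical series $P(\sup_{0 \le \lambda \le 1} |W^{(0)}(\lambda)| \le x) = 1 - 2 \sum_{k \ge 1} (-1)^{k-1} e^{-2 k^2 x^2}$ and inverts it by bisection. The truncation at 100 terms, the bisection bounds and the function names are choices made for this illustration only.

```python
import math


def kolmogorov_cdf(x: float, terms: int = 100) -> float:
    """P(sup |W0| <= x) for a standard Brownian bridge W0, via the alternating series."""
    if x <= 0.0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


def kolmogorov_quantile(p: float, lo: float = 0.2, hi: float = 5.0,
                        tol: float = 1e-10) -> float:
    """Quantile of the Kolmogorov distribution by bisection on the CDF."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Critical value for a test with asymptotic level 5%.
crit_5pct = kolmogorov_quantile(0.95)
```

Under Theorem 3.1, a test of asymptotic level 5% would reject $H_0$ when the statistic divided by a consistent estimate of $\sigma_1$ (respectively $\sigma_2$) exceeds `crit_5pct`.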

4. Simulation Results 

The assumptions regarding the underlying process $(X_i)$ in Theorem 2.4 are satisfied by a wide range of time series, such as AR and ARMA models. To illustrate the results and to investigate the finite sample behavior and the power of the tests based on $T_{1,n}$ and $T_{2,n}$, we will give some simulation results. We study the underlying change-point model

(29) $X_i = \begin{cases} \xi_i & i = 1, \dots, [n\lambda] \\ \mu + \xi_i & i = [n\lambda] + 1, \dots, n. \end{cases}$

Within this model, the hypothesis of no change is equivalent to $\mu = 0$. We assume that the noise follows an AR(1) process, i.e. that

(30) $\xi_i = \rho \xi_{i-1} + \epsilon_i,$

where $-1 < \rho < 1$, and where the innovations $\epsilon_i$ are i.i.d. random variables with mean zero. The innovations $\epsilon_i$ are generated from a standard normal or a $t_\nu$-distribution with $\nu = 3$ degrees of freedom, scaled to have the same 84.13% percentile as the standard normal, which is 1. The autoregression coefficient is varied in $\rho \in \{0.0, 0.4, 0.8\}$, corresponding to zero, moderate or strong positive autocorrelation, and the sample size is $n = 200$. For the choice of the block length we used Carlstein's adaptive rule outlined above, or a fixed block length of $l_n = 9$, which is in good agreement with the empirical findings of Dehling et al. (2013) for larger sample sizes and their theoretical result that $l_n$ should be chosen as $o(\sqrt{n})$ to achieve consistency. For the reason of comparison we also included tests employing overlapping subsampling for estimation of the asymptotical variance, applying the same block lengths as the non-overlapping versions.
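The simulation design can be sketched as follows, using only the Python standard library. Several details are assumptions of this sketch and are not specified in the text: the burn-in period used to approximate a stationary start of the AR(1) process, the representation of a $t_3$ variable as $Z / \sqrt{V/3}$ with $V$ a sum of three squared standard normals, and the bisection used to find the scaling constant from the closed-form $t_3$ distribution function. All names are hypothetical.

```python
import math
import random


def t3_cdf(t: float) -> float:
    """Distribution function of Student's t with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (math.atan(u) + u / (1.0 + u * u)) / math.pi


def t3_quantile(p: float, lo: float = -50.0, hi: float = 50.0,
                tol: float = 1e-12) -> float:
    """Quantile of the t_3 distribution by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t3_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Scale factor so that the 84.13% percentile of the scaled t_3 variable is 1.
T3_SCALE = 1.0 / t3_quantile(0.8413)


def simulate_shift_ar1(n: int, rho: float, mu: float, lam: float,
                       heavy_tailed: bool, rng: random.Random,
                       burn_in: int = 200) -> list:
    """Draw X_1..X_n: AR(1) noise plus a level shift mu after index [n * lam]."""

    def innovation() -> float:
        z = rng.gauss(0.0, 1.0)
        if not heavy_tailed:
            return z
        v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))  # chi-square, 3 d.f.
        return T3_SCALE * z / math.sqrt(v / 3.0)

    xi = 0.0
    for _ in range(burn_in):  # assumption: burn-in to approximate stationarity
        xi = rho * xi + innovation()
    k = int(n * lam)
    out = []
    for i in range(n):
        xi = rho * xi + innovation()
        out.append(xi + (mu if i >= k else 0.0))
    return out


x_null = simulate_shift_ar1(200, 0.4, 0.0, 0.5, False, random.Random(1))
x_shift = simulate_shift_ar1(200, 0.4, 2.0, 0.5, False, random.Random(1))
```

Using the same seed for `x_null` and `x_shift` yields two series that differ only by the level shift, which is convenient when comparing the behavior of a test under the hypothesis and under the alternative.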

Table 1 contains the empirical levels (i.e. the fraction of rejections) of the tests with an asymptotical level of 5%, obtained from 4000 simulation runs for each situation. Note that the tests developed under the assumption of independence, which do not adjust for autocorrelation, become strongly oversized with an increasingly positive autocorrelation, i.e. they reject a true null hypothesis by far too often and are practically useless already for $\rho = 0.4$. The performance of the adjusted tests is much better in this respect and in a good agreement with the asymptotical results. Only if the autocorrelation is strong ($\rho = 0.8$), the tests with a fixed block length become somewhat anti-conservative (oversized), and even more so for the CUSUM-test. Longer block lengths are needed for stronger positive autocorrelations, and Carlstein's adaptive block length (27) adjusts for this. There is little difference between the tests employing overlapping and non-overlapping subsampling here.









                         T_{1,n}                                  T_{2,n}
            unadj.   l_n fixed      adaptive      unadj.   l_n fixed      adaptive
  ν    ρ              ol    nol     ol    nol               ol    nol     ol    nol
  ∞   0.0     2.8    2.0    2.9    2.0    2.2      4.5     2.9    3.9    3.7    3.8
  ∞   0.4    24.5    2.5    3.1    3.5    3.9     34.2     3.9    4.9    5.5    6.0
  ∞   0.8    81.6    6.2    6.5    1.9    2.5     91.5    10.5   10.6    3.4    4.0
  3   0.0     3.1    2.2    2.9    2.2    2.9      3.8     2.5    3.5    3.1    3.1
  3   0.4    26.9    2.4    3.0    3.2    3.0     32.0     3.3    3.8    4.3    4.9
  3   0.8    82.7    6.9    7.0    2.0    2.8     90.6    10.2   10.5    3.2    3.9

Table 1. Empirical level (in percent) of the tests based on $T_{1,n}$ and $T_{2,n}$, for $n = 200$, with fixed or adaptive subsampling block length $l_n$ and overlapping (ol) or non-overlapping (nol) subsampling. The results are for AR(1) observations with different lag-one autocorrelations $\rho$ and different $t_\nu$-distributed innovations, and based on 4000 simulation runs each.



In order to investigate the powers of the tests under the alternative, a change in the mean, we consider shifts of increasing height $\mu$, generating 400 data sets for each situation. The sample size is again $n = 200$ and the change point is after observation number $\tau = [\lambda n] = 100$.

Figure 1 illustrates the powers of the different versions of the tests in case of Gaussian or $t_3$-distributed innovations and several autocorrelation coefficients $\rho$. Under normality, the CUSUM test $T_{2,n}$ is somewhat more powerful than the test $T_{1,n}$ based on the Wilcoxon statistic, while under the $t_3$-distribution it is the other way round. The CUSUM test with the fixed block length considered here becomes strongly oversized if $\rho$ is large, while this effect is less severe for the test based on the Wilcoxon statistic. Carlstein's adaptive choice of the block length increases the power if $\rho$ is small and improves the size of the test substantially if $\rho$ is large. The tests employing overlapping subsampling (not shown here) perform even slightly more powerful in case of zero or moderate autocorrelations, but much less powerful in case of strong autocorrelations.

The tests with Carlstein's adaptive choice of the block length could be improved further by using a more sophisticated estimate of $\rho$ than the ordinary sample autocorrelation used here. The latter is positively biased in the presence of a shift, which leads to too large choices of the block length. This negative effect becomes more severe for larger values of $\rho$, since the plug-in-estimate of the asymptotically MSE-optimal choice of $l_n$ increases more rapidly if $\rho$ is close to 1, while it is rather stable for moderate and small values of $\rho$. In our study, for $\rho = 0$ the average value chosen for $l_n$ increases from about 2 to about 3, only, as the height of the shift increases, while it is from about 6 to about 9 if $\rho = 0.4$, and even from about 16 to about 24 if $\rho = 0.8$. An estimate of the autocorrelation coefficient which resists shifts could be used, e.g. by applying a stepwise procedure which estimates the possible time of occurrence of a shift before calculating $\rho$ from the corrected data, but this will not be pursued here.

Figure 1. Power of the tests in case of a shift in the middle of an AR(1) process with Gaussian (left) and $t_3$-innovations (right) and different lag one correlations $\rho = 0.0$ (top), $\rho = 0.4$ (middle) or $\rho = 0.8$ (bottom), $n = 200$. Wilcoxon test $T_{1,n}$ (bold lines) and CUSUM test $T_{2,n}$ (thin lines). Adjustment by non-overlapping subsampling with fixed (black) or adaptive block length (grey). The horizontal axis shows the height of the shift.

5. Auxiliary Results 

In this section, we will prove some auxiliary results which will play a crucial role in the proof of Theorem 2.4. The main result of this section is the following proposition, which essentially shows that the degenerate part in the Hoeffding decomposition of the U-statistic $T_n(\lambda)$ is uniformly negligible.

Proposition 5.1. Let $(X_n)_{n \ge 1}$ be a 1-approximating functional with constants $(a_k)_{k \ge 1}$ of an absolutely regular process with mixing coefficients $(\beta(k))_{k \ge 1}$, satisfying

(31) $\sum_{k=1}^{\infty} k (\beta(k) + \sqrt{a_k} + \phi(a_k)) < \infty.$

Moreover, let $g(x, y)$ be a 1-continuous bounded degenerate kernel. Then, as $n \to \infty$,

(32) $\frac{1}{n^{3/2}} \sup_{0 \le \lambda \le 1} \Bigl| \sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} g(X_i, X_j) \Bigr| \to 0$

in probability.

The proof of Proposition 5.1 requires some moment bounds for increments of U-statistics of degenerate kernels, which we will now state as separate lemmas.

Lemma 5.2. Let $(X_n)_{n \ge 1}$ be a 1-approximating functional with constants $(a_k)_{k \ge 1}$ of an absolutely regular process with mixing coefficients $(\beta(k))_{k \ge 1}$, satisfying

(33) $\sum_{k=1}^{\infty} k (\beta(k) + \sqrt{a_k} + \phi(a_k)) < \infty.$

Moreover, let $g(x, y)$ be a 1-continuous bounded degenerate kernel. Then, there exists a constant $C_1$ such that

(34) $E \Bigl( \sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} g(X_i, X_j) \Bigr)^2 \le C_1 [n\lambda] (n - [n\lambda]).$

Proof. We can write

(35) $E \Bigl( \sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} g(X_i, X_j) \Bigr)^2 = \sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} E (g(X_i, X_j))^2 + 2 \sum_{1 \le i_1 \ne i_2 \le [n\lambda]} \; \sum_{[n\lambda]+1 \le j_1 \ne j_2 \le n} E (g(X_{i_1}, X_{j_1}) g(X_{i_2}, X_{j_2})).$

The elements of the first sum all are bounded, hence

(36) $\sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} E (g(X_i, X_j))^2 \le C [n\lambda] (n - [n\lambda]).$

Concerning the second sum, by Lemma 7.6 we get

(37) $\sum_{1 \le i_1 < i_2 \le [n\lambda]} \; \sum_{[n\lambda]+1 \le j_1 < j_2 \le n} E (g(X_{i_1}, X_{j_1}) g(X_{i_2}, X_{j_2})) \le 4S \sum_{1 \le i_1 < i_2 \le [n\lambda]} \; \sum_{[n\lambda]+1 \le j_1 < j_2 \le n} \phi(a_{[k/3]}) + 8S^2 \sum_{1 \le i_1 < i_2 \le [n\lambda]} \; \sum_{[n\lambda]+1 \le j_1 < j_2 \le n} \bigl( \sqrt{a_{[k/3]}} + \beta([k/3]) \bigr)$

with $k = \max\{|i_2 - i_1|, |j_2 - j_1|\}$. We will first treat the summands with $k = i_2 - i_1$. Suppose for a moment that $k$ is fixed; we will bound the number of indices that appear in the sum. Observe that in this case we have $[n\lambda]$ ways to choose $i_1$; once $i_1$ is chosen we have one way to pick $i_2$ because $i_2 = i_1 + k$. For $j_1$ we have as before $n - [n\lambda]$ ways to pick this index and then for each $j_1$, $j_2$ needs to be in the interval $[j_1, j_1 + k]$ and there are exactly $k$ integers in such an interval. Hence

(38) $\sum_{1 \le i_1 < i_2 \le [n\lambda]} \; \sum_{[n\lambda]+1 \le j_1 < j_2 \le n} \bigl( 4S \phi(a_{[k/3]}) + 8S^2 \sqrt{a_{[k/3]}} + 8S^2 \beta([k/3]) \bigr) \le C [n\lambda] (n - [n\lambda]) \Bigl( \sum_{k=1}^{n} k \phi(a_k) + \sum_{k=1}^{n} k \sqrt{a_k} + \sum_{k=1}^{n} k \beta(k) \Bigr) \le C [n\lambda] (n - [n\lambda]).$

Analogously we can find the bounds for the terms with $k = i_1 - i_2$, $k = j_2 - j_1$ and $k = j_1 - j_2$ using the conditions of summability. □

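We now define the process $G_n(\lambda)$, $0 \le \lambda \le 1$, by

(39) $G_n(\lambda) := n^{-3/2} \sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} g(X_i, X_j), \quad 0 \le \lambda \le 1.$

Lemma 5.3. Under the conditions of Lemma 5.2 there exists a constant $C$ such that

(40) $E (|G_n(\eta) - G_n(\mu)|^2) \le \frac{C}{n} (\eta - \mu)$

for all $0 \le \mu \le \eta \le 1$.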
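Proof. We can write

(41) $E (|G_n(\eta) - G_n(\mu)|^2)$

$\le \frac{2}{n^3} E \Bigl( \sum_{i=1}^{[n\mu]} \sum_{j=[n\mu]+1}^{[n\eta]} g(X_i, X_j) \Bigr)^2 + \frac{2}{n^3} E \Bigl( \sum_{i=[n\mu]+1}^{[n\eta]} \sum_{j=[n\eta]+1}^{n} g(X_i, X_j) \Bigr)^2$

$= \frac{2}{n^3} E \Bigl( \sum_{i=1}^{[n\mu]} \sum_{j=[n\mu]+1}^{[n\eta]} g(X_i, X_j) \Bigr)^2 + \frac{2}{n^3} E \Bigl( \sum_{i=1}^{[n\eta]-[n\mu]} \sum_{j=[n\eta]-[n\mu]+1}^{n-[n\mu]} g(X_i, X_j) \Bigr)^2$

$\le C \frac{1}{n^3} \bigl( [n\mu] ([n\eta] - [n\mu]) + ([n\eta] - [n\mu]) (n - [n\eta]) \bigr) \le \frac{C}{n} (\eta - \mu)$

using the stationarity of the process $(X_n)_{n \in \mathbb{N}}$ and Lemma 5.2. □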



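Proof of Proposition 5.1. From Lemma 5.3 we obtain, using Chebyshev's inequality,

(42) $P (|G_n(\eta) - G_n(\mu)| \ge \epsilon) \le \frac{1}{\epsilon^2} \frac{C}{n} (\eta - \mu)$

for all $\epsilon > 0$. Thus we get for $0 \le k \le m \le n$ with $k, m, n \in \mathbb{N}$

(43) $P \Bigl( \Bigl| G_n\Bigl(\frac{m}{n}\Bigr) - G_n\Bigl(\frac{k}{n}\Bigr) \Bigr| \ge \epsilon \Bigr) \le \frac{1}{\epsilon^2} E \Bigl( G_n\Bigl(\frac{m}{n}\Bigr) - G_n\Bigl(\frac{k}{n}\Bigr) \Bigr)^2 \le \frac{C}{\epsilon^2} \frac{m - k}{n^2} \le \frac{1}{\epsilon^2} \Bigl( \frac{C^{3/4}}{n^{5/4}} (m - k) \Bigr)^{4/3}$

as $m - k \le n$. Now consider the variables

(44) $\zeta_i = \begin{cases} G_n\bigl(\frac{i}{n}\bigr) - G_n\bigl(\frac{i-1}{n}\bigr) & \text{if } i = 1, \dots, n-1 \\ 0 & \text{else} \end{cases}$

and suppose that $S_i = \zeta_1 + \zeta_2 + \dots + \zeta_i$ with $S_0 = 0$, then $S_i = G_n(\frac{i}{n})$. In consequence the inequality (43) is equivalent to

(45) $P (|S_m - S_k| \ge \epsilon) \le \frac{1}{\epsilon^2} \Bigl( \frac{C^{3/4}}{n^{5/4}} (m - k) \Bigr)^{4/3}$

for $0 \le k \le m \le n$.

So the assumptions of Theorem 7.7 are satisfied with the variables (44) in the role of the $\xi_i$, $\beta = 1/2$, $\alpha = 2/3$ and $u_l = C^{3/4} / n^{5/4}$, $u_0 = 0$, and hence

(46) $P \Bigl( \max_{1 \le i \le n-1} |S_i| \ge \epsilon \Bigr) \le \frac{K}{\epsilon^2} \Bigl( \frac{C^{3/4}}{n^{5/4}} (n - 1) \Bigr)^{4/3} \le \frac{K C}{\epsilon^2 n^{1/3}}$

where $K$ depends only on $\alpha$ and $\beta$. Thus, (32) holds as $n \to \infty$. □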



6. Proof of Main Results 



In this section, we will prove Theorem 2.4 and Theorem 2.6. Note that Theorem 2.6 is a direct consequence of Theorem 2.4 applied to anti-symmetric kernels. We will nevertheless present a direct proof of Theorem 2.6 since this proof is much simpler than the proof in the general case. Moreover, Theorem 2.6 covers those cases that are most relevant in applications.

The first part of the proof is identical for both Theorem 2.4 and Theorem 2.6. Note that, for each $\lambda \in [0, 1]$, the statistic $T_n(\lambda)$ is a two-sample U-statistic. Thus, using the Hoeffding decomposition (8), we can write $T_n(\lambda)$ as

$T_n(\lambda) = \frac{1}{n^{3/2}} \sum_{i=1}^{[\lambda n]} \sum_{j=[\lambda n]+1}^{n} (h(X_i, X_j) - \theta)$

(47) $= \frac{1}{n^{3/2}} \Bigl( (n - [n\lambda]) \sum_{i=1}^{[n\lambda]} h_1(X_i) + [n\lambda] \sum_{j=[n\lambda]+1}^{n} h_2(X_j) + \sum_{i=1}^{[\lambda n]} \sum_{j=[\lambda n]+1}^{n} g(X_i, X_j) \Bigr).$

By Proposition 5.1 we know that

$\frac{1}{n^{3/2}} \sup_{0 \le \lambda \le 1} \Bigl| \sum_{i=1}^{[\lambda n]} \sum_{j=[\lambda n]+1}^{n} g(X_i, X_j) \Bigr| \to 0$

in probability. Thus, by Slutsky's lemma, it suffices to show that the sum of the first two terms, i.e.

(48) $\Bigl( \frac{1}{n^{3/2}} \Bigl( (n - [n\lambda]) \sum_{i=1}^{[n\lambda]} h_1(X_i) + [n\lambda] \sum_{j=[n\lambda]+1}^{n} h_2(X_j) \Bigr) \Bigr)_{0 \le \lambda \le 1}$

converges in distribution to the desired limit process.

Proof of Theorem 2.6. It remains to show that (48) converges in distribution to $\sigma W^{(0)}(\lambda)$, $0 \le \lambda \le 1$, where $(W^{(0)}(\lambda))_{0 \le \lambda \le 1}$ is standard Brownian bridge on $[0, 1]$, and where $\sigma^2$ is defined in (17). By antisymmetry of the kernel $h(x, y)$, we obtain that $h_2(x) = -h_1(x)$. Hence, in this case, (48) can be rewritten as

$\frac{n - [n\lambda]}{n^{3/2}} \sum_{i=1}^{[n\lambda]} h_1(X_i) - \frac{[n\lambda]}{n^{3/2}} \sum_{i=[n\lambda]+1}^{n} h_1(X_i) = \frac{1}{\sqrt{n}} \sum_{i=1}^{[n\lambda]} h_1(X_i) - \frac{[n\lambda]}{n} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} h_1(X_i).$

By Proposition 2.11 and Lemma 2.15 of Borovkova, Burton and Dehling (2001), the sequence $(h_1(X_i))_{i \ge 1}$ is a 1-approximating functional with approximating constants $C \sqrt{a_k}$. Since $h_1(X_i)$ is bounded, the $L_2$-near epoch dependence in the sense of Wooldridge and White (1988) also holds, with the same constants. Moreover, the underlying process $(Z_n)_{n \ge 1}$ is absolutely regular, and hence also strongly mixing. Thus we may apply the invariance principle in Corollary 3.2 of Wooldridge and White (1988), and obtain that the partial sum process

(49) $\Bigl( \frac{1}{\sqrt{n}} \sum_{i=1}^{[n\lambda]} h_1(X_i) \Bigr)_{0 \le \lambda \le 1}$

converges weakly to Brownian motion $(W(\lambda))_{0 \le \lambda \le 1}$ with $\mathrm{Var}(W(1)) = \sigma^2$. The statement of the theorem follows with the continuous mapping theorem for the mapping $x(t) \mapsto x(t) - t x(1)$, $0 \le t \le 1$. □

The proof of Theorem 2.4 requires an invariance principle for the partial sum process of $\mathbb{R}^2$-valued dependent random variables; see Proposition 6.1 below. For mixing processes, such invariance principles have been established even for partial sums of Hilbert space valued random vectors, e.g. by Dehling (1983). In this paper, we provide an extension of these results to functionals of mixing processes.

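Proposition 6.1. Let $(X_n)_{n \in \mathbb{N}}$ be a 1-approximating functional of an absolutely regular process with mixing coefficients $(\beta(k))$ and let $h_1(\cdot)$, $h_2(\cdot)$ be bounded 1-Lipschitz functions with mean zero. Suppose that the sequences $(\beta(k))_{k \ge 0}$, $(a_k)_{k \ge 0}$ and $(\phi(a_k))_{k \ge 0}$ satisfy

(50) $\sum_{k} k^2 (\beta(k) + a_k + \phi(a_k)) < \infty.$

Then, as $n \to \infty$,

(51) $\Bigl( \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt]} h_1(X_i), \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt]} h_2(X_i) \Bigr)_{0 \le t \le 1} \to (W_1(t), W_2(t))_{0 \le t \le 1}$

in distribution, where $(W_1(t), W_2(t))_{0 \le t \le 1}$ is a two-dimensional Brownian motion with mean zero and covariance $E(W_k(s) W_l(t)) = \min(s, t) \sigma_{kl}$, for $0 \le s, t \le 1$, with $\sigma_{kl}$ as defined in (13).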

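Proof. To prove (51), we need to establish finite-dimensional convergence and tightness. Concerning finite-dimensional convergence, by the Cramér-Wold device it suffices to show the convergence in distribution of a linear combination of the coordinates of the vector

(52) $\Bigl( \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt_1]} h_1(X_i), \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt_1]} h_2(X_i), \dots, \frac{1}{\sqrt{n}} \sum_{i=[nt_{j-1}]+1}^{[nt_j]} h_1(X_i), \frac{1}{\sqrt{n}} \sum_{i=[nt_{j-1}]+1}^{[nt_j]} h_2(X_i), \dots, \frac{1}{\sqrt{n}} \sum_{i=[nt_{k-1}]+1}^{n} h_1(X_i), \frac{1}{\sqrt{n}} \sum_{i=[nt_{k-1}]+1}^{n} h_2(X_i) \Bigr)$

for $0 = t_0 < t_1 < \dots < t_j < \dots < t_k = 1$. Any such linear combination can be expressed as

(53) $\sum_{j=1}^{k} \frac{1}{\sqrt{n}} \sum_{i=[nt_{j-1}]+1}^{[nt_j]} (a_j h_1(X_i) + b_j h_2(X_i)),$

for $(a_j, b_j)_{j=1}^{k} \in \mathbb{R}^{2k}$. By using the Cramér-Wold device again, the weak convergence of this sum is equivalent to the weak convergence of the vector

(54) $\Bigl( \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt_1]} (a_1 h_1(X_i) + b_1 h_2(X_i)), \dots, \frac{1}{\sqrt{n}} \sum_{i=[nt_{j-1}]+1}^{[nt_j]} (a_j h_1(X_i) + b_j h_2(X_i)), \dots, \frac{1}{\sqrt{n}} \sum_{i=[nt_{k-1}]+1}^{n} (a_k h_1(X_i) + b_k h_2(X_i)) \Bigr)$

to

(55) $\bigl( a_1 (W_1(t_1) - W_1(t_0)) + b_1 (W_2(t_1) - W_2(t_0)), \dots, a_k (W_1(t_k) - W_1(t_{k-1})) + b_k (W_2(t_k) - W_2(t_{k-1})) \bigr).$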
Since $(X_n)_{n \ge 1}$ is a 1-approximating functional, it can be coupled with a process consisting of independent blocks. Given integers $L := L_n = [n^{3/4}]$ and $l_n = [n^{1/2}]$, we introduce the $(l, L)$ blocking $(B_m)_{m \ge 0}$ of the variables $(a_j h_1(X_i) + b_j h_2(X_i))$ with $i = [nt_{j-1}] + 1, \dots, [nt_j]$, $j = 1, \dots, k$, and

(56) $B_m := \sum_{i=(m-1)(L_n + l_n)+1}^{m L_n + (m-1) l_n} (a_j h_1(X_i) + b_j h_2(X_i))$

and separating blocks

(57) $\tilde{B}_m := \sum_{i=m L_n + (m-1) l_n + 1}^{m (L_n + l_n)} (a_j h_1(X_i) + b_j h_2(X_i)).$

By Theorem 7.4 there exists a sequence of independent blocks $(B'_m)$ with the same blockwise marginal distribution as $(B_m)$ and such that

$P (|B_m - B'_m| \le 2 \alpha_l) \ge 1 - \beta([l/3]) - 2 \alpha_l,$

where $\alpha_l := \bigl( 2 \sum_{k=[l/3]}^{\infty} a_k \bigr)^{1/2}$. We can express the components of our vector (54) as a sum of blocks

(58) $\sum_{i=[nt_{j-1}]+1}^{[nt_j]} (a_j h_1(X_i) + b_j h_2(X_i)) = \sum_{m=[\frac{nt_{j-1}}{L+l}]+1}^{[\frac{nt_j}{L+l}]} B_m + \sum_{m=[\frac{nt_{j-1}}{L+l}]+1}^{[\frac{nt_j}{L+l}]} \tilde{B}_m + \sum_{i \in R_j} (a_j h_1(X_i) + b_j h_2(X_i)),$

where $R_j$ denotes the set of indices not contained in the blocks. Observe that by Lemma 7.1, for any set $A \subset \{1, \dots, n\}$,

(59) $E \Bigl( \sum_{i \in A} (a_j h_1(X_i) + b_j h_2(X_i)) \Bigr)^2 \le C \# A$

and hence

(60) $E \Bigl( \sum_{m=[\frac{nt_{j-1}}{L+l}]+1}^{[\frac{nt_j}{L+l}]} \tilde{B}_m \Bigr)^2 \le C \frac{n}{L_n + l_n} l_n \le C n^{3/4},$

so it follows with the Chebyshev inequality that this term is negligible. For the last summand, we have that

(61) $E \Bigl( \sum_{i \in R_j} (a_j h_1(X_i) + b_j h_2(X_i)) \Bigr)^2 \le C_2 (L_n + l_n) \le C n^{3/4}.$
Furthermore, we need to show that we can replace the blocks $B_m$ by the independent coupled blocks $B'_m$:

(63) $P \Bigl( \Bigl| \frac{1}{\sqrt{n}} \sum_{m=1}^{[\frac{n}{L+l}]} (B_m - B'_m) \Bigr| \ge \epsilon \Bigr) \le \sum_{m=1}^{[\frac{n}{L+l}]} P (|B_m - B'_m| \ge \epsilon n^{1/4}) \le \Bigl[ \frac{n}{L+l} \Bigr] \bigl( \beta([l/3]) + 2 \alpha_l \bigr) \to 0$

as $n \to \infty$ by our conditions on the mixing coefficients and approximation constants. Here we used the fact that $\alpha_n \to 0$ and thus, for almost all $n \in \mathbb{N}$,

(62) $P (|B_m - B'_m| \ge \epsilon n^{1/4}) \le P (|B_m - B'_m| \ge 2 \alpha_l).$

With the above arguments the result holds if we show the convergence of

$\Bigl( \frac{1}{\sqrt{n}} \sum_{m=1}^{[\frac{nt_1}{L+l}]} B'_m, \dots, \frac{1}{\sqrt{n}} \sum_{m=[\frac{nt_{k-1}}{L+l}]+1}^{[\frac{n}{L+l}]} B'_m \Bigr).$

Since this vector has independent components, we only need to show the one-dimensional convergence, which is a consequence of Theorem 7.3 using the summability condition (50).
We now turn to the question of tightness and show that, for each e and 77, there exist a 5, 
< (5 < 1, and an integer uq such that, for < t < 1, 



(64) 
with 
(65) 



-P ( sup \Yn{s) -Yt\>e]<ri, n > Uq 

\t<s<t+5 



Yn{t) 



a 



1 '"*^ 1 

-^ V hi{Xi) + (nt - [nt])^^Xint]+i 



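($h_2$ can be treated in the same way) and by Theorem 7.8 this condition reduces to: For each
positive $\epsilon$ there exist a $\lambda > 1$ and an integer $n_0$ such that

$$P\Big(\max_{i \le n} \Big|\sum_{j=1}^{i} h_1(X_j)\Big| > \lambda\sigma\sqrt{n}\Big) \le \frac{\epsilon}{\lambda^2}, \qquad n \ge n_0. \tag{66}$$

Let $t > s$, $s, t \in [0, 1]$. By Lemma 7.2 we get

$$E\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]} h_1(X_i) - \frac{1}{\sqrt{n}}\sum_{i=1}^{[ns]} h_1(X_i)\Big|^4 \le C\Big(\frac{[nt]-[ns]}{n}\Big)^2 \tag{67}$$

and this implies, by the Markov inequality,

$$P\Big(\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]} h_1(X_i) - \frac{1}{\sqrt{n}}\sum_{i=1}^{[ns]} h_1(X_i)\Big| > \lambda\Big) \le \frac{C}{\lambda^4}\Big(\frac{[nt]-[ns]}{n}\Big)^2. \tag{68}$$

By Theorem 7.7

$$P\Big(\max_{i \le n} \Big|\frac{1}{\sqrt{n}}\sum_{j=1}^{i} h_1(X_j)\Big| > \lambda\Big) \le \frac{K_{1,1}\,C}{\lambda^4} \tag{69}$$

and we get the assertion. In this way, we have established tightness of each of the two
coordinates of the partial sum process. This also implies tightness of the vector-valued
process. $\Box$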

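Proof of Theorem 2.4. From Proposition 6.1 we obtain that

$$\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^{[n\lambda]} h_1(X_i),\ \frac{1}{\sqrt{n}}\sum_{i=1}^{[n\lambda]} h_2(X_i)\Big)_{0 \le \lambda \le 1} \to \big(W_1(\lambda), W_2(\lambda)\big)_{0 \le \lambda \le 1} \tag{70}$$

in distribution on the space $(D[0, 1])^2$. We consider the functional given by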

$$\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} \mapsto (1-t)\,x_1(t) + t\,\big(x_2(1) - x_2(t)\big), \qquad 0 \le t \le 1. \tag{71}$$

This is a continuous mapping from $(D[0, 1])^2$ to $D[0,1]$, so we may apply the continuous
mapping theorem to (70), and obtain

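$$\Big(\frac{1-\lambda}{\sqrt{n}}\sum_{i=1}^{[n\lambda]} h_1(X_i) + \frac{\lambda}{\sqrt{n}}\sum_{j=[n\lambda]+1}^{n} h_2(X_j)\Big)_{0 \le \lambda \le 1} \to \big((1-\lambda)W_1(\lambda) + \lambda(W_2(1) - W_2(\lambda))\big)_{0 \le \lambda \le 1}.$$

Together with the remarks at the beginning of this section, this proves Theorem 2.4. $\Box$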
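As an illustration of the continuous-mapping step, the following minimal sketch evaluates the functional $(x_1, x_2) \mapsto (1-t)x_1(t) + t(x_2(1) - x_2(t))$ on paths sampled on the grid $t = k/n$. The function names `partial_sum_path` and `change_point_functional` are hypothetical and chosen only for this example; the discretization to grid points is an assumption and not part of the proof.

```python
import numpy as np


def partial_sum_path(values):
    """Scaled partial sums S_k / sqrt(n) for k = 0, ..., n, with S_0 = 0."""
    values = np.asarray(values, dtype=float)
    n = values.size
    return np.concatenate(([0.0], np.cumsum(values))) / np.sqrt(n)


def change_point_functional(x1, x2):
    """Evaluate (1 - t) * x1(t) + t * (x2(1) - x2(t)) on the grid t = k/n.

    x1 and x2 are paths sampled at t = 0, 1/n, ..., 1 (length n + 1).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if x1.ndim != 1 or x1.shape != x2.shape or x1.size < 2:
        raise ValueError("x1 and x2 must be 1-d paths of equal length >= 2")
    n = x1.size - 1
    t = np.arange(n + 1) / n
    # x2[-1] is the value of the second path at t = 1
    return (1.0 - t) * x1 + t * (x2[-1] - x2)
```

Feeding `partial_sum_path` applied to $h_1(X_i)$ and to $h_2(X_i)$ into `change_point_functional` gives a grid version of the process on the left-hand side of the last display. The output vanishes at $t = 1$, and also at $t = 0$ whenever the first path starts at zero.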

7. Appendix: Some Auxiliary Results from the Literature 

In this section, we collect some known lemmas and theorems for weakly dependent data. 
We start with some results on the behaviour of partial sums:

Lemma 7.1 (Lemma 2.23 [3]). Let $(X_k)_{k \in \mathbb{Z}}$ be a 1-approximating functional with constants
$(a_k)_{k \ge 0}$ of an absolutely regular process with mixing coefficients $(\beta(k))_{k \ge 0}$. Suppose moreover
that $EX_i = 0$ and that one of the following two conditions holds:

(1) $X_0$ is bounded a.s. and $\sum_{k=0}^{\infty} (a_k + \beta(k)) < \infty$.

(2) $E|X_0|^{2+\delta} < \infty$ and $\sum_{k=0}^{\infty} \big(a_k^{\delta/(1+\delta)} + \beta^{\delta/(2+\delta)}(k)\big) < \infty$.
Then, as $N \to \infty$,

$$\frac{1}{N} E S_N^2 \to E X_0^2 + 2 \sum_{j=1}^{\infty} E(X_0 X_j) \tag{72}$$

and the sum on the r.h.s. converges absolutely.

Lemma 7.2 (Lemma 2.24 [3]). Let $(X_k)_{k \in \mathbb{Z}}$ be a 1-approximating functional with constants
$(a_k)_{k \ge 0}$ of an absolutely regular process with mixing coefficients $(\beta(k))_{k \ge 0}$. Suppose moreover
that $EX_i = 0$ and that one of the following two conditions holds:

(1) $X_0$ is bounded a.s. and $\sum_{k=0}^{\infty} k^2 (a_k + \beta(k)) < \infty$.

(2) $E|X_0|^{4+\delta} < \infty$ and $\sum_{k=0}^{\infty} k^2 \big(a_k^{\delta/(3+\delta)} + \beta^{\delta/(4+\delta)}(k)\big) < \infty$.
Then there exists a constant $C$ such that

$$E S_N^4 \le C N^2. \tag{73}$$
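The way this fourth-moment bound is typically used in tightness arguments can be written out in one line. The following is a sketch under the assumption that the process is stationary, so that $S_j - S_i$ has the same distribution as $S_{j-i}$; the constant $C$ is the one from (73).

```latex
% Markov inequality applied to the fourth moment, then (73):
P\big(|S_j - S_i| \ge \lambda\big)
  \le \frac{E|S_j - S_i|^4}{\lambda^4}
  \le \frac{C\,(j-i)^2}{\lambda^4}
  = \frac{1}{\lambda^{4}} \Big( \sum_{i < l \le j} C^{1/2} \Big)^{2},
  \qquad 0 \le i \le j \le n .
```

This is a bound of the form required by the maximal inequality in Theorem 7.7, with $\beta = 1$, $\alpha = 1$ and $u_l = C^{1/2}$ for all $l$.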

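Theorem 7.3 (Theorem 4 [3]). Let $(X_k)_{k \in \mathbb{Z}}$ be a 1-approximating functional with constants
$(a_k)_{k \ge 0}$ of an absolutely regular process with mixing coefficients $(\beta(k))_{k \ge 0}$. Suppose moreover
that $EX_i = 0$, $E|X_0|^{4+\delta} < \infty$ and that

$$\sum_{k=0}^{\infty} k^2 \big(a_k^{\delta/(3+\delta)} + \beta^{\delta/(4+\delta)}(k)\big) < \infty, \tag{74}$$

for some $\delta > 0$. Then, as $n \to \infty$,

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i \to N(0, \sigma^2) \tag{75}$$

in distribution, where $\sigma^2 = E X_0^2 + 2 \sum_{j=1}^{\infty} E(X_0 X_j)$. In case $\sigma^2 = 0$, $N(0, 0)$ denotes the point mass at the
origin. If $X_0$ is bounded, the CLT continues to hold if (74) is replaced by the condition that
$\sum_{k=0}^{\infty} k^2 (a_k + \beta(k)) < \infty$.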

Coupling methods are an important tool for deriving asymptotic results for weakly dependent
data. We will need this method to prove the invariance principle (Proposition 6.1).



Theorem 7.4 (Theorem 3 [3]). Let $(X_n)_{n \in \mathbb{N}}$ be a 1-approximating functional with summable
constants $(a_k)_{k \ge 0}$ of an absolutely regular process with mixing rate $(\beta(k))_{k \ge 0}$. Then given
integers $K$, $L$ and $N$, we can approximate the sequence of $(K + 2L, N)$-blocks $(B_s)_{s \ge 1}$ by
a sequence of independent blocks $(B'_s)_{s \ge 1}$ with the same marginal distribution in such a way
that

$$P\big(\|B_s - B'_s\| \le 2\alpha_L\big) \ge 1 - \beta(K) - 2\alpha_L, \tag{76}$$

where $\alpha_L := \big(2 \sum_{l=L}^{\infty} a_l\big)^{1/2}$.

In statistical applications, the question of how to estimate $\sigma^2$ is important. In the situation
when the observations are a functional of an $\alpha$-mixing process, Dehling et al. [9] propose
the estimation of the variance of partial sums of dependent processes by the subsampling
estimator

$$\hat{D}_n = \frac{1}{[n/l_n]} \sqrt{\frac{\pi}{2}} \sum_{i=1}^{[n/l_n]} \frac{|T_i(l_n) - l_n U_n|}{\sqrt{l_n}} \tag{77}$$

with $T_i(l) = \sum_{j=(i-1)l+1}^{il} F_n(X_j)$ and $U_n = \frac{1}{n} \sum_{i=1}^{n} F_n(X_i)$, where $F_n(\cdot)$ is the empirical
distribution function (e.d.f.).
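A minimal numerical sketch of such a subsampling estimator is given below. It assumes the estimator has the form $\frac{1}{[n/l]}\sqrt{\pi/2}\sum_{i=1}^{[n/l]} |T_i(l) - l\,U_n| / \sqrt{l}$ with non-overlapping blocks of length $l$, and that $F_n(x)$ is the proportion of observations less than or equal to $x$. The function name `subsampling_sd_estimate` and the argument `block_len` are hypothetical and not taken from [9].

```python
import numpy as np


def subsampling_sd_estimate(x, block_len):
    """Block-subsampling estimate built from e.d.f.-transformed observations.

    Assumed form: sqrt(pi/2) * mean_i |T_i - l * U_n| / sqrt(l), where T_i is
    the sum of F_n(X_j) over the i-th non-overlapping block of length l.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if x.ndim != 1 or not 1 <= block_len <= n:
        raise ValueError("need a 1-d sample and 1 <= block_len <= n")
    # F_n(X_j) = #{i : X_i <= X_j} / n, computed via a sorted copy
    f = np.searchsorted(np.sort(x), x, side="right") / n
    u_n = f.mean()
    n_blocks = n // block_len
    # Drop the incomplete last block, then sum within each block
    t = f[: n_blocks * block_len].reshape(n_blocks, block_len).sum(axis=1)
    deviations = np.abs(t - block_len * u_n) / np.sqrt(block_len)
    return np.sqrt(np.pi / 2.0) * deviations.mean()
```

The factor $\sqrt{\pi/2}$ compensates for $E|Z| = \sqrt{2/\pi}$ when $Z$ is standard normal, so the average of absolute block deviations targets a standard deviation rather than a mean absolute deviation. The choice of `block_len` is left to the user; the theory below requires it to grow more slowly than $\sqrt{n}$.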

Theorem 7.5 (Theorem 1.2 [9]). Let $(X_k)_{k \ge 1}$ be a stationary, 1-approximating functional
of an $\alpha$-mixing process. Suppose that for some $\delta > 0$, $E|X_1|^{2+\delta} < \infty$, and that the mixing
coefficients $(\alpha_k)_{k \ge 1}$ and the approximation constants $(a_k)_{k \ge 1}$ satisfy

$$\sum_{k=1}^{\infty} (\alpha_k)^{\delta/(2+\delta)} < \infty, \qquad \sum_{k=1}^{\infty} (a_k)^{\delta/(1+\delta)} < \infty. \tag{78}$$

In addition, we assume that $F$ is Lipschitz-continuous, that $\alpha_k = O(k^{-8})$ and that $a_m =
O(m^{-12})$. Then, as $n \to \infty$, $l_n \to \infty$ and $l_n = o(\sqrt{n})$, we have $\hat{D}_n \to \sigma$ in $L_2$.
To deal with the degenerate kernel $g$, we need to find upper bounds for $E\big(g(X_{i_1}, X_{j_1})\, g(X_{i_2}, X_{j_2})\big)$
in terms of the maximal distance among the indices. Due to $1 \le i_1 \le i_2 \le [n\lambda]$ and
$[n\lambda] + 1 \le j_1 \le j_2 \le n$, we have w.l.o.g. $i_1 \le i_2 < j_1 \le j_2$.

Lemma 7.6 (Proposition 6.1 in [8]). Let $(X_n)_{n \ge 1}$ be a 1-approximating functional with
constants $(a_k)_{k \ge 1}$ of an absolutely regular process with mixing coefficients $(\beta(k))_{k \ge 1}$ and let
$g(x,y)$ be a 1-continuous bounded degenerate kernel. Then we have

$$\big|E\big(g(X_{i_1}, X_{j_1})\, g(X_{i_2}, X_{j_2})\big)\big| \le 4S\phi(\alpha_{[k/3]}) + 8S^2\big(\alpha_{[k/3]} + \beta([k/3])\big) \tag{79}$$

where $S = \sup_{x,y} |g(x,y)|$ and $k = \max\{i_2 - i_1,\ j_1 - i_2,\ j_2 - j_1\}$.

The following two results are useful for proving tightness of a stochastic process. The
first one is used to control the fluctuation of the maximum. Let $\xi_1, \dots, \xi_n$ be random variables
(stationary or not, independent or not). We write $S_k = \xi_1 + \dots + \xi_k$ ($S_0 = 0$), and put
$M_n = \max_{0 \le k \le n} |S_k|$.

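Theorem 7.7 (Theorem 10.2 [2]). Suppose that $\beta \ge 0$ and $\alpha > 1/2$ and that there exist
nonnegative numbers $u_1, \dots, u_n$ such that for all positive $\lambda$

$$P\big(|S_j - S_i| \ge \lambda\big) \le \frac{1}{\lambda^{4\beta}} \Big(\sum_{i < l \le j} u_l\Big)^{2\alpha}, \qquad 0 \le i \le j \le n; \tag{80}$$

then for all positive $\lambda$

$$P(M_n \ge \lambda) \le \frac{K_{\beta,\alpha}}{\lambda^{4\beta}} \Big(\sum_{0 < l \le n} u_l\Big)^{2\alpha}, \tag{81}$$

where $K_{\beta,\alpha}$ is a constant depending only on $\beta$ and $\alpha$.
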
Theorem 7.8 (Theorem 8.4 [2]). The sequence $\{Y_n\}$, defined by

$$Y_n(t) = \frac{1}{\sigma\sqrt{n}} S_{[nt]} + (nt - [nt]) \frac{1}{\sigma\sqrt{n}} \xi_{[nt]+1}, \tag{82}$$

is tight if for each $\epsilon > 0$ there exist a $\lambda > 1$ and an $n_0 \in \mathbb{N}$ such that for $n \ge n_0$

$$P\Big(\max_{i \le n} |S_{k+i} - S_k| > \lambda\sigma\sqrt{n}\Big) \le \frac{\epsilon}{\lambda^2} \tag{83}$$

holds for all $k$.
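To make the objects in the two tightness results concrete, the following sketch computes the linearly interpolated partial-sum process $Y_n(t) = \frac{1}{\sigma\sqrt{n}} S_{[nt]} + (nt - [nt]) \frac{1}{\sigma\sqrt{n}} \xi_{[nt]+1}$ and the maximum $M_n = \max_{0 \le k \le n} |S_k|$ for a given finite sample. The function names are hypothetical, and `sigma` is treated as a known normalizing constant, which is an assumption made only for this example.

```python
import numpy as np


def interpolated_partial_sum(xi, t, sigma=1.0):
    """Value at t in [0, 1] of the linearly interpolated partial-sum process."""
    xi = np.asarray(xi, dtype=float)
    n = xi.size
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    s = np.concatenate(([0.0], np.cumsum(xi)))  # s[k] = S_k, with S_0 = 0
    k = int(np.floor(n * t))                     # k = [nt]
    frac = n * t - k                             # nt - [nt]
    nxt = xi[k] if k < n else 0.0                # xi_{[nt]+1}; unused when t = 1
    return (s[k] + frac * nxt) / (sigma * np.sqrt(n))


def max_abs_partial_sum(xi):
    """M_n = max over 0 <= k <= n of |S_k|."""
    xi = np.asarray(xi, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(xi)))
    return float(np.max(np.abs(s)))
```

For example, with `xi = [1, 2, 3, 4]` and `sigma = 1`, the process takes the value $S_2/\sqrt{4} = 1.5$ at $t = 0.5$ and interpolates linearly between neighbouring grid points $k/n$.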